Overview

Dataset statistics

                                Dataset A   Dataset B
Number of variables             12          12
Number of observations          446         446
Missing cells                   426         443
Missing cells (%)               8.0%        8.3%
Duplicate rows                  0           0
Duplicate rows (%)              0.0%        0.0%
Total size in memory            45.3 KiB    45.3 KiB
Average record size in memory   104.0 B     104.0 B
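
The table does not state how "Missing cells (%)" is derived. A minimal sketch, assuming it is the number of missing cells divided by the total number of cells (variables × observations), reproduces both reported percentages. The function name is illustrative and not part of the profiling tool.

```python
# Figures copied from the Dataset statistics table.
N_VARIABLES = 12
N_OBSERVATIONS = 446
MISSING_CELLS = {"Dataset A": 426, "Dataset B": 443}


def missing_cell_share(missing: int, n_vars: int, n_obs: int) -> float:
    """Return missing cells as a percentage of all cells (assumed formula)."""
    return 100.0 * missing / (n_vars * n_obs)


shares = {
    name: round(missing_cell_share(m, N_VARIABLES, N_OBSERVATIONS), 1)
    for name, m in MISSING_CELLS.items()
}
print(shares)
```

The missing-cell totals also match the per-variable counts listed under Alerts and in the Embarked section: 81 + 344 + 1 = 426 for Dataset A and 88 + 353 + 2 = 443 for Dataset B.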

Variable types

              Dataset A   Dataset B
Numeric       5           5
Categorical   4           4
Text          3           3

Alerts

Dataset A                              Dataset B                              Alert type
Age has 81 (18.2%) missing values      Age has 88 (19.7%) missing values      Missing
Cabin has 344 (77.1%) missing values   Cabin has 353 (79.1%) missing values   Missing
PassengerId has unique values          PassengerId has unique values          Unique
Name has unique values                 Name has unique values                 Unique
SibSp has 303 (67.9%) zeros            SibSp has 300 (67.3%) zeros            Zeros
Parch has 337 (75.6%) zeros            Parch has 337 (75.6%) zeros            Zeros
Fare has 7 (1.6%) zeros                Fare has 8 (1.8%) zeros                Zeros

Reproduction

                         Dataset A                    Dataset B
Analysis started         2024-03-26 15:35:46.925469   2024-03-26 15:35:50.878274
Analysis finished        2024-03-26 15:35:50.877118   2024-03-26 15:35:54.036519
Duration                 3.95 seconds                 3.16 seconds
Software version         ydata-profiling v0.0.dev0    ydata-profiling v0.0.dev0
Download configuration   config.json                  config.json
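
The reported durations can be checked against the start and finish timestamps. The sketch below assumes the timestamps use the format `YYYY-MM-DD HH:MM:SS.ffffff` and that "Duration" is simply finish minus start, rounded to two decimals.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S.%f"

# (started, finished) pairs copied from the Reproduction table.
runs = {
    "Dataset A": ("2024-03-26 15:35:46.925469", "2024-03-26 15:35:50.877118"),
    "Dataset B": ("2024-03-26 15:35:50.878274", "2024-03-26 15:35:54.036519"),
}

durations = {}
for name, (started, finished) in runs.items():
    delta = datetime.strptime(finished, FMT) - datetime.strptime(started, FMT)
    durations[name] = round(delta.total_seconds(), 2)

print(durations)
```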

Variables

PassengerId
Real number (ℝ)

                Dataset A    Dataset B
Distinct        446          446
Distinct (%)    100.0%       100.0%
Missing         0            0
Missing (%)     0.0%         0.0%
Infinite        0            0
Infinite (%)    0.0%         0.0%
Mean            434.59865    450.8722
Minimum         2            1
Maximum         891          891
Zeros           0            0
Zeros (%)       0.0%         0.0%
Negative        0            0
Negative (%)    0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Quantile statistics

                            Dataset A   Dataset B
Minimum                     2           1
5-th percentile             45.5        40
Q1                          184.5       226.25
median                      440         459.5
Q3                          664.75      679.75
95-th percentile            847.75      846
Maximum                     891         891
Range                       889         890
Interquartile range (IQR)   480.25      453.5
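
Two rows of this table follow from the others: the range is maximum minus minimum, and the interquartile range is Q3 minus Q1. A short check using the PassengerId figures (the dictionary layout is only illustrative):

```python
# Quantile figures for PassengerId, copied from the table.
quantiles = {
    "Dataset A": {"min": 2, "q1": 184.5, "q3": 664.75, "max": 891},
    "Dataset B": {"min": 1, "q1": 226.25, "q3": 679.75, "max": 891},
}

spread = {
    name: {"range": q["max"] - q["min"], "iqr": q["q3"] - q["q1"]}
    for name, q in quantiles.items()
}
print(spread)
```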

Descriptive statistics

                                  Dataset A       Dataset B
Standard deviation                264.52706       259.52334
Coefficient of variation (CV)     0.60866976      0.57560289
Kurtosis                          -1.2862288      -1.2357739
Mean                              434.59865       450.8722
Median Absolute Deviation (MAD)   239.5           229.5
Skewness                          0.046691704     -0.062391757
Sum                               193831          201089
Variance                          69974.564       67352.363
Monotonicity                      Not monotonic   Not monotonic
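
Several of these statistics are tied together by standard identities: the coefficient of variation is the standard deviation divided by the mean, the variance is the standard deviation squared, and the sum is the mean times the number of non-missing observations. The sketch below checks the reported PassengerId values against those identities, with a tolerance that allows for the rounding in the table. It is a consistency check on the printed numbers, not a re-implementation of how the report computes them.

```python
import math

N = 446  # observations per dataset; PassengerId has no missing values

reported = {
    "Dataset A": {"mean": 434.59865, "std": 264.52706, "cv": 0.60866976,
                  "variance": 69974.564, "sum": 193831},
    "Dataset B": {"mean": 450.8722, "std": 259.52334, "cv": 0.57560289,
                  "variance": 67352.363, "sum": 201089},
}

checks = {}
for name, s in reported.items():
    checks[name] = {
        "cv = std / mean": math.isclose(s["std"] / s["mean"], s["cv"], rel_tol=1e-6),
        "variance = std ** 2": math.isclose(s["std"] ** 2, s["variance"], rel_tol=1e-6),
        "sum = mean * n": math.isclose(s["mean"] * N, s["sum"], rel_tol=1e-6),
    }
print(checks)
```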
[Figure: histogram with fixed size bins (bins=50); image not reproduced]
Most frequent values (Dataset A)

Value                Count   Frequency (%)
733                  1       0.2%
508                  1       0.2%
153                  1       0.2%
653                  1       0.2%
628                  1       0.2%
497                  1       0.2%
488                  1       0.2%
704                  1       0.2%
442                  1       0.2%
280                  1       0.2%
Other values (436)   436     97.8%

Most frequent values (Dataset B)

Value                Count   Frequency (%)
645                  1       0.2%
469                  1       0.2%
779                  1       0.2%
253                  1       0.2%
397                  1       0.2%
318                  1       0.2%
777                  1       0.2%
437                  1       0.2%
643                  1       0.2%
71                   1       0.2%
Other values (436)   436     97.8%

Smallest values (Dataset A)

Value   Count   Frequency (%)
2       1       0.2%
5       1       0.2%
11      1       0.2%
12      1       0.2%
14      1       0.2%
15      1       0.2%
18      1       0.2%
20      1       0.2%
21      1       0.2%
22      1       0.2%

Smallest values (Dataset B)

Value   Count   Frequency (%)
1       1       0.2%
2       1       0.2%
5       1       0.2%
8       1       0.2%
10      1       0.2%
11      1       0.2%
12      1       0.2%
14      1       0.2%
18      1       0.2%
20      1       0.2%

Survived
Categorical

               Dataset A   Dataset B
Distinct       2           2
Distinct (%)   0.4%        0.4%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Length

                Dataset A   Dataset B
Max length      1           1
Median length   1           1
Mean length     1           1
Min length      1           1

Characters and Unicode

                      Dataset A   Dataset B
Total characters      446         446
Distinct characters   2           2
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
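
As an illustration of such character properties, Python's standard `unicodedata` module exposes the general category of each code point. This is a sketch using the standard library only; it is an assumption that this resembles what the report does internally, and the report itself shows "(unknown)" for categories, scripts, and blocks. Note that `unicodedata` does not provide script or block lookups. The helper name `category_counts` is hypothetical.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over every character in `values`."""
    return Counter(
        unicodedata.category(ch) for value in values for ch in str(value)
    )


# A few values of the kind that appear in this report's samples.
sample = ["0", "1", "male", "C.A. 17248"]
counts = category_counts(sample)
print(counts)
```

Here digits fall under `Nd` (decimal number), lowercase letters under `Ll`, uppercase letters under `Lu`, the full stop under `Po`, and the space under `Zs`.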

Unique

             Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%
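
"Distinct" and "Unique" are easy to confuse. The sketch below assumes that "Distinct" counts the different non-missing values and "Unique" counts the values that occur exactly once; that reading is an assumption, but it is consistent with the figures shown here (Survived has 2 distinct values and 0 unique ones, because both values occur many times). The function name is illustrative.

```python
from collections import Counter


def distinct_and_unique(values):
    """Return (number of distinct values, number of values seen exactly once)."""
    counts = Counter(v for v in values if v is not None)
    distinct = len(counts)
    unique = sum(1 for c in counts.values() if c == 1)
    return distinct, unique


# Survived in Dataset A: 271 zeros and 175 ones.
survived_a = [0] * 271 + [1] * 175
print(distinct_and_unique(survived_a))
```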

Sample

          Dataset A   Dataset B
1st row   0           1
2nd row   0           1
3rd row   0           0
4th row   0           0
5th row   0           0

Common Values

        Dataset A               Dataset B
Value   Count   Frequency (%)   Count   Frequency (%)
0       271     60.8%           280     62.8%
1       175     39.2%           166     37.2%

Length

[Figure: histogram of lengths of the category; image not reproduced]

Common Values (Plot)

[Figures: common-values plots for Dataset A and Dataset B; images not reproduced. The plotted counts repeat the Common Values table.]

Most occurring characters

        Dataset A               Dataset B
Value   Count   Frequency (%)   Count   Frequency (%)
0       271     60.8%           280     62.8%
1       175     39.2%           166     37.2%

Most occurring categories

            Dataset A               Dataset B
Value       Count   Frequency (%)   Count   Frequency (%)
(unknown)   446     100.0%          446     100.0%

Most frequent character per category

(unknown): same counts as under "Most occurring characters".

Most occurring scripts

            Dataset A               Dataset B
Value       Count   Frequency (%)   Count   Frequency (%)
(unknown)   446     100.0%          446     100.0%

Most frequent character per script

(unknown): same counts as under "Most occurring characters".

Most occurring blocks

            Dataset A               Dataset B
Value       Count   Frequency (%)   Count   Frequency (%)
(unknown)   446     100.0%          446     100.0%

Most frequent character per block

(unknown): same counts as under "Most occurring characters".

Pclass
Categorical

               Dataset A   Dataset B
Distinct       3           3
Distinct (%)   0.7%        0.7%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Length

                Dataset A   Dataset B
Max length      1           1
Median length   1           1
Mean length     1           1
Min length      1           1

Characters and Unicode

                      Dataset A   Dataset B
Total characters      446         446
Distinct characters   3           3
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

          Dataset A   Dataset B
1st row   2           3
2nd row   2           1
3rd row   1           2
4th row   3           2
5th row   2           2

Common Values

        Dataset A               Dataset B
Value   Count   Frequency (%)   Count   Frequency (%)
3       241     54.0%           253     56.7%
1       113     25.3%           95      21.3%
2       92      20.6%           98      22.0%

Length

[Figure: histogram of lengths of the category; image not reproduced]

Common Values (Plot)

[Figures: common-values plots for Dataset A and Dataset B; images not reproduced. The plotted counts repeat the Common Values table.]

Most occurring characters

        Dataset A               Dataset B
Value   Count   Frequency (%)   Count   Frequency (%)
3       241     54.0%           253     56.7%
1       113     25.3%           95      21.3%
2       92      20.6%           98      22.0%

Most occurring categories

            Dataset A               Dataset B
Value       Count   Frequency (%)   Count   Frequency (%)
(unknown)   446     100.0%          446     100.0%

Most frequent character per category

(unknown): same counts as under "Most occurring characters".

Most occurring scripts

            Dataset A               Dataset B
Value       Count   Frequency (%)   Count   Frequency (%)
(unknown)   446     100.0%          446     100.0%

Most frequent character per script

(unknown): same counts as under "Most occurring characters".

Most occurring blocks

            Dataset A               Dataset B
Value       Count   Frequency (%)   Count   Frequency (%)
(unknown)   446     100.0%          446     100.0%

Most frequent character per block

(unknown): same counts as under "Most occurring characters".

Name
Text

               Dataset A   Dataset B
Distinct       446         446
Distinct (%)   100.0%      100.0%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Length

                Dataset A   Dataset B
Max length      82          67
Median length   51          47
Mean length     27.215247   26.540359
Min length      13          12

Characters and Unicode

                      Dataset A   Dataset B
Total characters      12138       11837
Distinct characters   60          60
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Dataset A   Dataset B
Unique       446         446
Unique (%)   100.0%      100.0%

Sample

          Dataset A                        Dataset B
1st row   Knight, Mr. Robert J             Baclini, Miss. Eugenie
2nd row   Kirkland, Rev. Charles Leonard   Dodge, Master. Washington
3rd row   Minahan, Dr. William Edward      Campbell, Mr. William
4th row   Dantcheff, Mr. Ristiu            Reeves, Mr. David
5th row   Renouf, Mr. Peter Henry          Collyer, Mr. Harvey

Most frequent words (Dataset A)

Value                Count   Frequency (%)
mr                   260     14.2%
miss                 88      4.8%
mrs                  68      3.7%
william              27      1.5%
john                 24      1.3%
master               19      1.0%
henry                18      1.0%
george               15      0.8%
charles              13      0.7%
elizabeth            13      0.7%
Other values (892)   1284    70.2%

Most frequent words (Dataset B)

Value                Count   Frequency (%)
mr                   270     15.2%
miss                 88      4.9%
mrs                  59      3.3%
william              36      2.0%
john                 20      1.1%
master               19      1.1%
henry                15      0.8%
charles              13      0.7%
anna                 12      0.7%
james                12      0.7%
Other values (875)   1234    69.4%

Most occurring characters

                    Dataset A               Dataset B
Value               Count   Frequency (%)   Count   Frequency (%)
(space)             1384    11.4%           1333    11.3%
r                   997     8.2%            983     8.3%
e                   870     7.2%            850     7.2%
a                   827     6.8%            813     6.9%
n                   669     5.5%            679     5.7%
i                   653     5.4%            669     5.7%
s                   641     5.3%            637     5.4%
M                   561     4.6%            554     4.7%
l                   533     4.4%            527     4.5%
o                   503     4.1%            476     4.0%
Other values (50)   4500    37.1%           4316    36.5%

Most occurring categories

            Dataset A               Dataset B
Value       Count   Frequency (%)   Count   Frequency (%)
(unknown)   12138   100.0%          11837   100.0%

Most frequent character per category

(unknown): same counts as under "Most occurring characters".

Most occurring scripts

            Dataset A               Dataset B
Value       Count   Frequency (%)   Count   Frequency (%)
(unknown)   12138   100.0%          11837   100.0%

Most frequent character per script

(unknown): same counts as under "Most occurring characters".

Most occurring blocks

            Dataset A               Dataset B
Value       Count   Frequency (%)   Count   Frequency (%)
(unknown)   12138   100.0%          11837   100.0%

Most frequent character per block

(unknown): same counts as under "Most occurring characters".

Sex
Categorical

               Dataset A   Dataset B
Distinct       2           2
Distinct (%)   0.4%        0.4%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Length

                Dataset A   Dataset B
Max length      6           6
Median length   4           4
Mean length     4.7040359   4.6681614
Min length      4           4

Characters and Unicode

                      Dataset A   Dataset B
Total characters      2098        2082
Distinct characters   5           5
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

          Dataset A   Dataset B
1st row   male        female
2nd row   male        male
3rd row   male        male
4th row   male        male
5th row   male        male

Common Values

         Dataset A               Dataset B
Value    Count   Frequency (%)   Count   Frequency (%)
male     289     64.8%           297     66.6%
female   157     35.2%           149     33.4%

Length

[Figure: histogram of lengths of the category; image not reproduced]

Common Values (Plot)

[Figures: common-values plots for Dataset A and Dataset B; images not reproduced. The plotted counts repeat the Common Values table.]

Most occurring characters

        Dataset A               Dataset B
Value   Count   Frequency (%)   Count   Frequency (%)
e       603     28.7%           595     28.6%
m       446     21.3%           446     21.4%
a       446     21.3%           446     21.4%
l       446     21.3%           446     21.4%
f       157     7.5%            149     7.2%

Most occurring categories

            Dataset A               Dataset B
Value       Count   Frequency (%)   Count   Frequency (%)
(unknown)   2098    100.0%          2082    100.0%

Most frequent character per category

(unknown): same counts as under "Most occurring characters".

Most occurring scripts

            Dataset A               Dataset B
Value       Count   Frequency (%)   Count   Frequency (%)
(unknown)   2098    100.0%          2082    100.0%

Most frequent character per script

(unknown): same counts as under "Most occurring characters".

Most occurring blocks

            Dataset A               Dataset B
Value       Count   Frequency (%)   Count   Frequency (%)
(unknown)   2098    100.0%          2082    100.0%

Most frequent character per block

(unknown): same counts as under "Most occurring characters".

Age
Real number (ℝ)

                Dataset A    Dataset B
Distinct        73           76
Distinct (%)    20.0%        21.2%
Missing         81           88
Missing (%)     18.2%        19.7%
Infinite        0            0
Infinite (%)    0.0%         0.0%
Mean            29.671233    30.378492
Minimum         0.75         0.75
Maximum         80           80
Zeros           0            0
Zeros (%)       0.0%         0.0%
Negative        0            0
Negative (%)    0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB
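
Age is the first numeric variable here with missing values, which makes the denominators visible. The figures are consistent with "Missing (%)" being taken over all 446 observations and "Distinct (%)" being taken over the non-missing observations only. That reading is an inference from the numbers, not something the report states; the helper names are illustrative.

```python
N_OBSERVATIONS = 446

# (distinct, missing) for Age, copied from the table.
age = {"Dataset A": (73, 81), "Dataset B": (76, 88)}


def missing_pct(missing: int, n_obs: int) -> float:
    """Missing values as a percentage of all observations."""
    return round(100.0 * missing / n_obs, 1)


def distinct_pct(distinct: int, missing: int, n_obs: int) -> float:
    """Distinct values as a percentage of non-missing observations (assumed)."""
    return round(100.0 * distinct / (n_obs - missing), 1)


age_pcts = {
    name: {
        "missing_pct": missing_pct(missing, N_OBSERVATIONS),
        "distinct_pct": distinct_pct(distinct, missing, N_OBSERVATIONS),
    }
    for name, (distinct, missing) in age.items()
}
print(age_pcts)
```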

Quantile statistics

                            Dataset A   Dataset B
Minimum                     0.75        0.75
5-th percentile             5           4
Q1                          21          21
median                      29          29
Q3                          37          39
95-th percentile            55.4        59.15
Maximum                     80          80
Range                       79.25       79.25
Interquartile range (IQR)   16          18

Descriptive statistics

                                  Dataset A       Dataset B
Standard deviation                14.17264        15.048925
Coefficient of variation (CV)     0.47765592      0.49538091
Kurtosis                          0.42499776      0.062886566
Mean                              29.671233       30.378492
Median Absolute Deviation (MAD)   8               9
Skewness                          0.43737603      0.39996082
Sum                               10830           10875.5
Variance                          200.86372       226.47014
Monotonicity                      Not monotonic   Not monotonic
[Figure: histogram with fixed size bins (bins=50); image not reproduced]
Most frequent values (Dataset A)

Value               Count   Frequency (%)
28                  18      4.0%
24                  16      3.6%
36                  14      3.1%
29                  13      2.9%
30                  12      2.7%
22                  12      2.7%
23                  12      2.7%
35                  12      2.7%
17                  11      2.5%
32                  10      2.2%
Other values (63)   235     52.7%
(Missing)           81      18.2%

Most frequent values (Dataset B)

Value               Count   Frequency (%)
24                  17      3.8%
21                  16      3.6%
28                  13      2.9%
36                  12      2.7%
29                  12      2.7%
18                  12      2.7%
22                  12      2.7%
19                  12      2.7%
32                  11      2.5%
27                  11      2.5%
Other values (66)   230     51.6%
(Missing)           88      19.7%

Smallest values (Dataset A)

Value   Count   Frequency (%)
0.75    1       0.2%
0.83    1       0.2%
0.92    1       0.2%
1       3       0.7%
2       3       0.7%
3       3       0.7%
4       6       1.3%
5       2       0.4%
7       3       0.7%
8       2       0.4%

Smallest values (Dataset B)

Value   Count   Frequency (%)
0.75    2       0.4%
1       2       0.4%
2       6       1.3%
3       2       0.4%
4       7       1.6%
5       2       0.4%
6       3       0.7%
7       3       0.7%
8       3       0.7%
9       3       0.7%

SibSp
Real number (ℝ)

                Dataset A    Dataset B
Distinct        7            7
Distinct (%)    1.6%         1.6%
Missing         0            0
Missing (%)     0.0%         0.0%
Infinite        0            0
Infinite (%)    0.0%         0.0%
Mean            0.58520179   0.57623318
Minimum         0            0
Maximum         8            8
Zeros           303          300
Zeros (%)       67.9%        67.3%
Negative        0            0
Negative (%)    0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Quantile statistics

                            Dataset A   Dataset B
Minimum                     0           0
5-th percentile             0           0
Q1                          0           0
median                      0           0
Q3                          1           1
95-th percentile            3           3
Maximum                     8           8
Range                       8           8
Interquartile range (IQR)   1           1

Descriptive statistics

                                  Dataset A       Dataset B
Standard deviation                1.2436466       1.2094275
Coefficient of variation (CV)     2.1251586       2.0988508
Kurtosis                          14.765281       16.092636
Mean                              0.58520179      0.57623318
Median Absolute Deviation (MAD)   0               0
Skewness                          3.4728816       3.5759277
Sum                               261             257
Variance                          1.5466569       1.4627148
Monotonicity                      Not monotonic   Not monotonic
[Figure: histogram with fixed size bins (bins=7); image not reproduced]
Most frequent values (Dataset A)

Value   Count   Frequency (%)
0       303     67.9%
1       98      22.0%
2       16      3.6%
3       10      2.2%
4       9       2.0%
8       5       1.1%
5       5       1.1%

Most frequent values (Dataset B)

Value   Count   Frequency (%)
0       300     67.3%
1       102     22.9%
2       18      4.0%
4       12      2.7%
3       7       1.6%
8       5       1.1%
5       2       0.4%

Values in ascending order (Dataset A)

Value   Count   Frequency (%)
0       303     67.9%
1       98      22.0%
2       16      3.6%
3       10      2.2%
4       9       2.0%
5       5       1.1%
8       5       1.1%

Values in ascending order (Dataset B)

Value   Count   Frequency (%)
0       300     67.3%
1       102     22.9%
2       18      4.0%
3       7       1.6%
4       12      2.7%
5       2       0.4%
8       5       1.1%

Parch
Real number (ℝ)

                Dataset A    Dataset B
Distinct        7            7
Distinct (%)    1.6%         1.6%
Missing         0            0
Missing (%)     0.0%         0.0%
Infinite        0            0
Infinite (%)    0.0%         0.0%
Mean            0.40134529   0.38789238
Minimum         0            0
Maximum         6            6
Zeros           337          337
Zeros (%)       75.6%        75.6%
Negative        0            0
Negative (%)    0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Quantile statistics

                            Dataset A   Dataset B
Minimum                     0           0
5-th percentile             0           0
Q1                          0           0
median                      0           0
Q3                          0           0
95-th percentile            2           2
Maximum                     6           6
Range                       6           6
Interquartile range (IQR)   0           0

Descriptive statistics

                                  Dataset A       Dataset B
Standard deviation                0.8415598       0.82909809
Coefficient of variation (CV)     2.0968473       2.1374436
Kurtosis                          10.294673       11.679163
Mean                              0.40134529      0.38789238
Median Absolute Deviation (MAD)   0               0
Skewness                          2.7901656       2.9745314
Sum                               179             173
Variance                          0.70822291      0.68740364
Monotonicity                      Not monotonic   Not monotonic
[Figure: histogram with fixed size bins (bins=7); image not reproduced]
Most frequent values (Dataset A)

Value   Count   Frequency (%)
0       337     75.6%
1       57      12.8%
2       44      9.9%
5       3       0.7%
3       3       0.7%
4       1       0.2%
6       1       0.2%

Most frequent values (Dataset B)

Value   Count   Frequency (%)
0       337     75.6%
1       64      14.3%
2       37      8.3%
5       3       0.7%
4       2       0.4%
3       2       0.4%
6       1       0.2%

Values in ascending order (Dataset A)

Value   Count   Frequency (%)
0       337     75.6%
1       57      12.8%
2       44      9.9%
3       3       0.7%
4       1       0.2%
5       3       0.7%
6       1       0.2%

Values in ascending order (Dataset B)

Value   Count   Frequency (%)
0       337     75.6%
1       64      14.3%
2       37      8.3%
3       2       0.4%
4       2       0.4%
5       3       0.7%
6       1       0.2%

Ticket
Text

               Dataset A   Dataset B
Distinct       388         380
Distinct (%)   87.0%       85.2%
Missing        0           0
Missing (%)    0.0%        0.0%
Memory size    7.0 KiB     7.0 KiB

Length

                Dataset A   Dataset B
Max length      18          18
Median length   17          17
Mean length     6.5717489   6.8654709
Min length      4           3

Characters and Unicode

                      Dataset A   Dataset B
Total characters      2931        3062
Distinct characters   31          31
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Dataset A   Dataset B
Unique       347         335
Unique (%)   77.8%       75.1%

Sample

          Dataset A   Dataset B
1st row   239855      2666
2nd row   219533      33638
3rd row   19928       239853
4th row   349203      C.A. 17248
5th row   31027       C.A. 31921

Most frequent words (Dataset A)

Value                Count   Frequency (%)
pc                   36      6.4%
ca                   12      2.1%
c.a                  9       1.6%
a/5                  8       1.4%
2144                 6       1.1%
sc/paris             6       1.1%
2343                 5       0.9%
1601                 5       0.9%
w./c                 5       0.9%
347082               4       0.7%
Other values (406)   464     82.9%

Most frequent words (Dataset B)

Value                Count   Frequency (%)
pc                   23      4.0%
c.a                  12      2.1%
ca                   8       1.4%
2                    8       1.4%
ston/o               8       1.4%
a/5                  8       1.4%
sc/paris             6       1.1%
2343                 5       0.9%
f.c.c                5       0.9%
w./c                 4       0.7%
Other values (401)   483     84.7%

Most occurring characters

                    Dataset A               Dataset B
Value               Count   Frequency (%)   Count   Frequency (%)
3                   367     12.5%           394     12.9%
1                   339     11.6%           348     11.4%
2                   287     9.8%            289     9.4%
7                   259     8.8%            244     8.0%
4                   252     8.6%            241     7.9%
0                   202     6.9%            201     6.6%
5                   198     6.8%            197     6.4%
6                   195     6.7%            202     6.6%
9                   151     5.2%            167     5.5%
8                   143     4.9%            136     4.4%
Other values (21)   538     18.4%           643     21.0%

Most occurring categories

            Dataset A               Dataset B
Value       Count   Frequency (%)   Count   Frequency (%)
(unknown)   2931    100.0%          3062    100.0%

Most frequent character per category

(unknown): same counts as under "Most occurring characters".

Most occurring scripts

            Dataset A               Dataset B
Value       Count   Frequency (%)   Count   Frequency (%)
(unknown)   2931    100.0%          3062    100.0%

Most frequent character per script

(unknown): same counts as under "Most occurring characters".

Most occurring blocks

            Dataset A               Dataset B
Value       Count   Frequency (%)   Count   Frequency (%)
(unknown)   2931    100.0%          3062    100.0%

Most frequent character per block

(unknown): same counts as under "Most occurring characters".

Fare
Real number (ℝ)

                Dataset A    Dataset B
Distinct        193          178
Distinct (%)    43.3%        39.9%
Missing         0            0
Missing (%)     0.0%         0.0%
Infinite        0            0
Infinite (%)    0.0%         0.0%
Mean            32.048214    30.853717
Minimum         0            0
Maximum         512.3292     512.3292
Zeros           7            8
Zeros (%)       1.6%         1.8%
Negative        0            0
Negative (%)    0.0%         0.0%
Memory size     7.0 KiB      7.0 KiB

Quantile statistics

                            Dataset A   Dataset B
Minimum                     0           0
5-th percentile             7.225       7.162525
Q1                          7.925       7.8958
median                      15.0229     13.5
Q3                          32.8302     30.0708
95-th percentile            110.8833    91.0792
Maximum                     512.3292    512.3292
Range                       512.3292    512.3292
Interquartile range (IQR)   24.9052     22.175

Descriptive statistics

                                  Dataset A       Dataset B
Standard deviation                46.036861       50.011745
Coefficient of variation (CV)     1.4364876       1.620931
Kurtosis                          32.473246       41.650934
Mean                              32.048214       30.853717
Median Absolute Deviation (MAD)   7.7729          6.25
Skewness                          4.5897746       5.457171
Sum                               14293.504       13760.758
Variance                          2119.3926       2501.1747
Monotonicity                      Not monotonic   Not monotonic
[Figure: histogram with fixed size bins (bins=50); image not reproduced]
Most frequent values (Dataset A)

Value                Count   Frequency (%)
13                   23      5.2%
7.8958               19      4.3%
8.05                 19      4.3%
26                   16      3.6%
7.75                 15      3.4%
10.5                 9       2.0%
26.55                8       1.8%
0                    7       1.6%
7.225                7       1.6%
7.925                7       1.6%
Other values (183)   316     70.9%

Most frequent values (Dataset B)

Value                Count   Frequency (%)
8.05                 21      4.7%
7.75                 21      4.7%
13                   20      4.5%
7.8958               18      4.0%
26                   15      3.4%
10.5                 13      2.9%
7.925                11      2.5%
7.25                 9       2.0%
0                    8       1.8%
26.55                7       1.6%
Other values (168)   303     67.9%

Smallest values (Dataset A)

Value    Count   Frequency (%)
0        7       1.6%
6.75     2       0.4%
6.8583   1       0.2%
6.95     1       0.2%
6.975    2       0.4%
7.0458   1       0.2%
7.05     3       0.7%
7.0542   1       0.2%
7.125    1       0.2%
7.1417   1       0.2%

Smallest values (Dataset B)

Value    Count   Frequency (%)
0        8       1.8%
6.2375   1       0.2%
6.45     1       0.2%
6.4958   1       0.2%
6.8583   1       0.2%
6.975    2       0.4%
7.0458   1       0.2%
7.05     2       0.4%
7.0542   2       0.4%
7.125    3       0.7%

Cabin
Text

               Dataset A   Dataset B
Distinct       86          77
Distinct (%)   84.3%       82.8%
Missing        344         353
Missing (%)    77.1%       79.1%
Memory size    7.0 KiB     7.0 KiB

Length

                Dataset A   Dataset B
Max length      11          15
Median length   3           3
Mean length     3.3431373   3.4516129
Min length      1           1

Characters and Unicode

                      Dataset A   Dataset B
Total characters      341         321
Distinct characters   18          18
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Dataset A   Dataset B
Unique       71          62
Unique (%)   69.6%       66.7%

Sample

          Dataset A   Dataset B
1st row   C78         A34
2nd row   B20         C93
3rd row   D36         C46
4th row   B35         B28
5th row   C2          E67

Most frequent words (Dataset A)

Value               Count   Frequency (%)
b96                 3       2.6%
b98                 3       2.6%
f4                  2       1.8%
b35                 2       1.8%
b20                 2       1.8%
c2                  2       1.8%
g6                  2       1.8%
g73                 2       1.8%
f                   2       1.8%
b5                  2       1.8%
Other values (82)   92      80.7%

Most frequent words (Dataset B)

Value               Count   Frequency (%)
g6                  3       2.9%
b5                  2       1.9%
c68                 2       1.9%
b28                 2       1.9%
b98                 2       1.9%
b96                 2       1.9%
f                   2       1.9%
d35                 2       1.9%
b49                 2       1.9%
e67                 2       1.9%
Other values (77)   84      80.0%

Most occurring characters

Dataset A

Value              Count   Frequency (%)
C                  37      10.9%
2                  35      10.3%
3                  32      9.4%
B                  27      7.9%
5                  24      7.0%
0                  22      6.5%
1                  22      6.5%
6                  20      5.9%
8                  17      5.0%
9                  16      4.7%
Other values (8)   89      26.1%

Dataset B

Value              Count   Frequency (%)
1                  35      10.9%
B                  31      9.7%
2                  26      8.1%
6                  25      7.8%
3                  25      7.8%
C                  20      6.2%
E                  20      6.2%
5                  20      6.2%
4                  18      5.6%
D                  15      4.7%
Other values (8)   86      26.8%

Most occurring categories

            Dataset A               Dataset B
Value       Count   Frequency (%)   Count   Frequency (%)
(unknown)   341     100.0%          321     100.0%

Most frequent character per category

(unknown): same counts as under "Most occurring characters".

Most occurring scripts

            Dataset A               Dataset B
Value       Count   Frequency (%)   Count   Frequency (%)
(unknown)   341     100.0%          321     100.0%

Most frequent character per script

(unknown): same counts as under "Most occurring characters".

Most occurring blocks

            Dataset A               Dataset B
Value       Count   Frequency (%)   Count   Frequency (%)
(unknown)   341     100.0%          321     100.0%

Most frequent character per block

(unknown): same counts as under "Most occurring characters".

Embarked
Categorical

               Dataset A   Dataset B
Distinct       3           3
Distinct (%)   0.7%        0.7%
Missing        1           2
Missing (%)    0.2%        0.4%
Memory size    7.0 KiB     7.0 KiB

Length

                Dataset A   Dataset B
Max length      1           1
Median length   1           1
Mean length     1           1
Min length      1           1

Characters and Unicode

                      Dataset A   Dataset B
Total characters      445         444
Distinct characters   3           3
Distinct categories   1           1
Distinct scripts      1           1
Distinct blocks       1           1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

             Dataset A   Dataset B
Unique       0           0
Unique (%)   0.0%        0.0%

Sample

          Dataset A   Dataset B
1st row   S           C
2nd row   Q           S
3rd row   Q           S
4th row   S           S
5th row   S           S

Common Values

            Dataset A               Dataset B
Value       Count   Frequency (%)   Count   Frequency (%)
S           325     72.9%           329     73.8%
C           82      18.4%           75      16.8%
Q           38      8.5%            40      9.0%
(Missing)   1       0.2%            2       0.4%

Length

[Figure: histogram of lengths of the category; image not reproduced]

Common Values (Plot)

[Figures: common-values plots for Dataset A and Dataset B; images not reproduced]

        Dataset A               Dataset B
Value   Count   Frequency (%)   Count   Frequency (%)
s       325     73.0%           329     74.1%
c       82      18.4%           75      16.9%
q       38      8.5%            40      9.0%

Most occurring characters

        Dataset A               Dataset B
Value   Count   Frequency (%)   Count   Frequency (%)
S       325     73.0%           329     74.1%
C       82      18.4%           75      16.9%
Q       38      8.5%            40      9.0%

Most occurring categories

            Dataset A               Dataset B
Value       Count   Frequency (%)   Count   Frequency (%)
(unknown)   445     100.0%          444     100.0%

Most frequent character per category

(unknown): same counts as under "Most occurring characters".

Most occurring scripts

            Dataset A               Dataset B
Value       Count   Frequency (%)   Count   Frequency (%)
(unknown)   445     100.0%          444     100.0%

Most frequent character per script

(unknown): same counts as under "Most occurring characters".

Most occurring blocks

            Dataset A               Dataset B
Value       Count   Frequency (%)   Count   Frequency (%)
(unknown)   445     100.0%          444     100.0%

Most frequent character per block

(unknown): same counts as under "Most occurring characters".

Interactions

[Figures: 25 interaction plots, each rendered once for Dataset A and once for Dataset B; images not reproduced]

Missing values

Dataset A

A simple visualization of nullity by column.

Dataset B

A simple visualization of nullity by column.

Dataset A

The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Dataset B

The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
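
The per-column nullity that these plots summarize can be approximated with a short, library-free sketch. The helper names `is_missing` and `nullity_by_column` are hypothetical, and the three rows are a hand-copied subset of the Dataset A sample (PassengerId, Age, Cabin only); this is not how ydata-profiling computes its counts internally.

```python
from typing import Any, Dict, List, Tuple


def is_missing(value: Any) -> bool:
    """Treat None and float NaN as missing (NaN is the only float not equal to itself)."""
    return value is None or (isinstance(value, float) and value != value)


def nullity_by_column(rows: List[Dict[str, Any]]) -> Dict[str, Tuple[int, float]]:
    """Return {column: (missing_count, missing_fraction)} over all rows."""
    counts: Dict[str, int] = {}
    for row in rows:
        for col, val in row.items():
            counts[col] = counts.get(col, 0) + (1 if is_missing(val) else 0)
    n = len(rows)
    return {col: (c, c / n if n else 0.0) for col, c in counts.items()}


# Illustrative subset of the Dataset A sample rows (assumption: three columns only).
rows = [
    {"PassengerId": 733, "Age": float("nan"), "Cabin": None},
    {"PassengerId": 627, "Age": 57.0, "Cabin": None},
    {"PassengerId": 246, "Age": 44.0, "Cabin": "C78"},
]

summary = nullity_by_column(rows)
for col, (count, frac) in summary.items():
    print(f"{col}: {count} missing ({frac:.1%})")
```

With the data loaded in a pandas DataFrame `df`, `df.isna().sum()` should give the same per-column counts.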

Sample

Dataset A

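Index | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
732 | 733 | 0 | 2 | Knight, Mr. Robert J | male | NaN | 0 | 0 | 239855 | 0.0000 | NaN | S
626 | 627 | 0 | 2 | Kirkland, Rev. Charles Leonard | male | 57.0 | 0 | 0 | 219533 | 12.3500 | NaN | Q
245 | 246 | 0 | 1 | Minahan, Dr. William Edward | male | 44.0 | 2 | 0 | 19928 | 90.0000 | C78 | Q
794 | 795 | 0 | 3 | Dantcheff, Mr. Ristiu | male | 25.0 | 0 | 0 | 349203 | 7.8958 | NaN | S
476 | 477 | 0 | 2 | Renouf, Mr. Peter Henry | male | 34.0 | 1 | 0 | 31027 | 21.0000 | NaN | S
690 | 691 | 1 | 1 | Dick, Mr. Albert Adrian | male | 31.0 | 1 | 0 | 17474 | 57.0000 | B20 | S
705 | 706 | 0 | 2 | Morley, Mr. Henry Samuel ("Mr Henry Marshall") | male | 39.0 | 0 | 0 | 250655 | 26.0000 | NaN | S
161 | 162 | 1 | 2 | Watt, Mrs. James (Elizabeth "Bessie" Inglis Milne) | female | 40.0 | 0 | 0 | C.A. 33595 | 15.7500 | NaN | S
795 | 796 | 0 | 2 | Otter, Mr. Richard | male | 39.0 | 0 | 0 | 28213 | 13.0000 | NaN | S
753 | 754 | 0 | 3 | Jonkoff, Mr. Lalio | male | 23.0 | 0 | 0 | 349204 | 7.8958 | NaN | S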

Dataset B

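Index | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
644 | 645 | 1 | 3 | Baclini, Miss. Eugenie | female | 0.75 | 2 | 1 | 2666 | 19.2583 | NaN | C
445 | 446 | 1 | 1 | Dodge, Master. Washington | male | 4.00 | 0 | 2 | 33638 | 81.8583 | A34 | S
466 | 467 | 0 | 2 | Campbell, Mr. William | male | NaN | 0 | 0 | 239853 | 0.0000 | NaN | S
265 | 266 | 0 | 2 | Reeves, Mr. David | male | 36.00 | 0 | 0 | C.A. 17248 | 10.5000 | NaN | S
637 | 638 | 0 | 2 | Collyer, Mr. Harvey | male | 31.00 | 1 | 1 | C.A. 31921 | 26.2500 | NaN | S
237 | 238 | 1 | 2 | Collyer, Miss. Marjorie "Lottie" | female | 8.00 | 0 | 2 | C.A. 31921 | 26.2500 | NaN | S
531 | 532 | 0 | 3 | Toufik, Mr. Nakli | male | NaN | 0 | 0 | 2641 | 7.2292 | NaN | C
603 | 604 | 0 | 3 | Torber, Mr. Ernst William | male | 44.00 | 0 | 0 | 364511 | 8.0500 | NaN | S
816 | 817 | 0 | 3 | Heininen, Miss. Wendla Maria | female | 23.00 | 0 | 0 | STON/O2. 3101290 | 7.9250 | NaN | S
469 | 470 | 1 | 3 | Baclini, Miss. Helene Barbara | female | 0.75 | 2 | 1 | 2666 | 19.2583 | NaN | C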

Dataset A

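Index | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
43 | 44 | 1 | 2 | Laroche, Miss. Simonne Marie Anne Andree | female | 3.0 | 1 | 2 | SC/Paris 2123 | 41.5792 | NaN | C
272 | 273 | 1 | 2 | Mellinger, Mrs. (Elizabeth Anne Maidment) | female | 41.0 | 0 | 1 | 250644 | 19.5000 | NaN | S
649 | 650 | 1 | 3 | Stanley, Miss. Amy Zillah Elsie | female | 23.0 | 0 | 0 | CA. 2314 | 7.5500 | NaN | S
418 | 419 | 0 | 2 | Matthews, Mr. William John | male | 30.0 | 0 | 0 | 28228 | 13.0000 | NaN | S
211 | 212 | 1 | 2 | Cameron, Miss. Clear Annie | female | 35.0 | 0 | 0 | F.C.C. 13528 | 21.0000 | NaN | S
660 | 661 | 1 | 1 | Frauenthal, Dr. Henry William | male | 50.0 | 2 | 0 | PC 17611 | 133.6500 | NaN | S
177 | 178 | 0 | 1 | Isham, Miss. Ann Elizabeth | female | 50.0 | 0 | 0 | PC 17595 | 28.7125 | C49 | C
781 | 782 | 1 | 1 | Dick, Mrs. Albert Adrian (Vera Gillespie) | female | 17.0 | 1 | 0 | 17474 | 57.0000 | B20 | S
121 | 122 | 0 | 3 | Moore, Mr. Leonard Charles | male | NaN | 0 | 0 | A4. 54510 | 8.0500 | NaN | S
625 | 626 | 0 | 1 | Sutton, Mr. Frederick | male | 61.0 | 0 | 0 | 36963 | 32.3208 | D50 | S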

Dataset B

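Index | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked
604 | 605 | 1 | 1 | Homer, Mr. Harry ("Mr E Haven") | male | 35.0 | 0 | 0 | 111426 | 26.5500 | NaN | C
231 | 232 | 0 | 3 | Larsson, Mr. Bengt Edvin | male | 29.0 | 0 | 0 | 347067 | 7.7750 | NaN | S
553 | 554 | 1 | 3 | Leeni, Mr. Fahim ("Philip Zenni") | male | 22.0 | 0 | 0 | 2620 | 7.2250 | NaN | C
774 | 775 | 1 | 2 | Hocking, Mrs. Elizabeth (Eliza Needs) | female | 54.0 | 1 | 3 | 29105 | 23.0000 | NaN | S
570 | 571 | 1 | 2 | Harris, Mr. George | male | 62.0 | 0 | 0 | S.W./PP 752 | 10.5000 | NaN | S
126 | 127 | 0 | 3 | McMahon, Mr. Martin | male | NaN | 0 | 0 | 370372 | 7.7500 | NaN | Q
150 | 151 | 0 | 2 | Bateman, Rev. Robert James | male | 51.0 | 0 | 0 | S.O.P. 1166 | 12.5250 | NaN | S
573 | 574 | 1 | 3 | Kelly, Miss. Mary | female | NaN | 0 | 0 | 14312 | 7.7500 | NaN | Q
347 | 348 | 1 | 3 | Davison, Mrs. Thomas Henry (Mary E Finck) | female | NaN | 1 | 0 | 386525 | 16.1000 | NaN | S
866 | 867 | 1 | 2 | Duran y More, Miss. Asuncion | female | 27.0 | 1 | 0 | SC/PARIS 2149 | 13.8583 | NaN | C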

Duplicate rows

Dataset A

Dataset does not contain duplicate rows.

Dataset B

Dataset does not contain duplicate rows.
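
A duplicate-row check of this kind can be sketched in plain Python by counting identical row tuples. The function name `duplicate_rows` is hypothetical, and the rows are a three-column excerpt of the Dataset A sample chosen for illustration; the report's own detection logic may differ.

```python
from collections import Counter
from typing import Dict, Hashable, List, Tuple

Row = Tuple[Hashable, ...]


def duplicate_rows(rows: List[Row]) -> Dict[Row, int]:
    """Return {row: occurrence_count} for every row that appears more than once."""
    counts = Counter(rows)
    return {row: n for row, n in counts.items() if n > 1}


# Illustrative excerpt (assumption: PassengerId, Name, Sex only).
rows = [
    (733, "Knight, Mr. Robert J", "male"),
    (627, "Kirkland, Rev. Charles Leonard", "male"),
    (246, "Minahan, Dr. William Edward", "male"),
]

no_dups = duplicate_rows(rows)                 # all rows distinct
with_dup = duplicate_rows(rows + [rows[0]])    # first row repeated once

print("duplicates in original rows:", no_dups)
print("duplicates after repeating one row:", with_dup)
```

Because PassengerId is unique in both datasets, no full-row duplicate is possible here, which agrees with the result above. In pandas, `df.duplicated(keep=False)` flags every member of a duplicated group.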